CS267 Spring 2016Practice Final

Briefly explain the QUIT and CONTINUE accumulator pruning strategies for Term-at-a-Time Query Processing.
Express the following using our operators for GC-Lists. The contents of the body of any html document whose head contains at least two meta tags.
Briefly explain the following coding schemes and what they are used for: Canonical Huffman Code, LLRUN.
Consider the list `L=langle 5,8,15,25 rangle`. Compute its `Delta`-list, then encode it using a `gamma`-code.
Define the REBUILD and REMERGE batch update operations. Give a concrete (non-trivial, i.e., some work is done) situation in which they would both have the same performance.
Describe how the Logarithmic Merge Dynamic Index algorithm works.
Suppose there is a 1,000,000 word corpus. Document `d` is 300 words long. The terms blue and bayou occur 1000, 300 times respectively. In document `d` blue occurs twice and bayou once. Let `mu=1000`. What is the LMD score for `d` for the query "blue bayou"?
Derive how the factor `P_1` is estimated in the DFR formula.
Briefly explain how query documents are returned in the parallel setting when query processing with term partitioning is done.
Describe how the HITS algorithm measures the quality of a document. How does SALSA differ from HITS?